

Adversarial Distributional Training for Robust Deep Learning

Neural Information Processing Systems

Adversarial training (AT) is among the most effective techniques for improving model robustness by augmenting training data with adversarial examples. However, most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against unseen attacks. Moreover, a single attack algorithm can be insufficient to explore the space of perturbations. In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models. ADT is formulated as a minimax optimization problem, where the inner maximization aims to learn an adversarial distribution that characterizes the potential adversarial examples around a natural input under an entropic regularizer, and the outer minimization aims to train robust models by minimizing the expected loss over the worst-case adversarial distributions. Through a theoretical analysis, we develop a general algorithm for solving ADT and present three approaches for parameterizing the adversarial distributions, ranging from typical Gaussian distributions to flexible implicit ones. Empirical results on several benchmarks validate the effectiveness of ADT compared with state-of-the-art AT methods.
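The inner objective described above can be illustrated with a minimal numpy sketch, assuming a Gaussian parameterization of the adversarial distribution and a toy quadratic stand-in for the classification loss (the function names, the loss, and all constants here are hypothetical, not the paper's code): the expected loss under p(delta) = N(mu, diag(sigma^2)) is estimated with reparameterized Monte Carlo samples, and the Gaussian entropy supplies the entropic regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(x_adv):
    # Stand-in for the classification loss evaluated on a perturbed input.
    return np.sum(x_adv ** 2, axis=-1)

def gaussian_entropy(sigma):
    # Entropy of N(mu, diag(sigma^2)): 0.5 * sum(log(2*pi*e*sigma^2)).
    return 0.5 * np.sum(np.log(2 * np.pi * np.e * sigma ** 2))

def adt_inner_objective(x, mu, sigma, lam=0.01, n_samples=128):
    # Monte Carlo estimate of E_{delta ~ N(mu, sigma^2)}[loss(x + delta)]
    # plus the entropic regularizer, via the reparameterization trick.
    eps = rng.standard_normal((n_samples, x.shape[-1]))
    delta = mu + sigma * eps
    expected_loss = toy_loss(x + delta).mean()
    return expected_loss + lam * gaussian_entropy(sigma)

x = np.ones(4)
mu = np.zeros(4)
sigma = 0.1 * np.ones(4)
print(adt_inner_objective(x, mu, sigma))
```

The inner maximization would ascend this objective with respect to (mu, sigma); the reparameterized samples keep the estimate differentiable in those parameters.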


Review for NeurIPS paper: Adversarial Distributional Training for Robust Deep Learning

Neural Information Processing Systems

Thank you for your submission to NeurIPS. After discussion, the reviewers are all in agreement that the proposed method presents an interesting and significant addition to the literature on adversarial training. The one criticism the reviewers raised, that the method was not compared to the current state of the art in standard adversarial training, was well addressed by the author response, and I strongly encourage the authors to include these results in the final version.


Review for NeurIPS paper: Adversarial Distributional Training for Robust Deep Learning

Neural Information Processing Systems

Additional Feedback: I thought the method was very cool! One thing I thought the paper was doing (which turned out to be a misunderstanding, I think) is relaxing the L2 adversarial constraint a bit. This is more of an intuition (and did not affect my review in any way), but to some extent it seems like if what one cares about is L2 adversarial robustness, then maximizing the inner loss with PGD is in some sense going to be "optimal"/hard-to-beat (some results in the Madry et al. paper corroborate this; few-step PGD is pretty good at finding the best maxima we can find in general). On the other hand, what you have is a weaker adversary (the distributional one with an entropic regularizer), but it has the advantage of being a potentially structured way of enforcing a better constraint than L2 robustness. Again, this isn't part of my review, but it would be cool to see if it is possible to define a new robustness constraint that is explicitly tailored to your learned adversary (e.g.
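The PGD inner maximization the reviewer refers to can be sketched as projected gradient ascent on an L2 ball, in the style of Madry et al. (the toy loss and every name here are illustrative assumptions, not this paper's or that paper's code): step along the normalized loss gradient, then project the perturbation back onto the ball of radius eps.

```python
import numpy as np

def toy_loss_grad(x_adv):
    # Gradient of the stand-in loss ||x_adv||^2 / 2 with respect to x_adv.
    return x_adv

def pgd_l2(x, eps=0.5, alpha=0.1, steps=20):
    # Projected gradient ascent: maximize the toy loss over the L2 ball
    # of radius eps around the natural input x.
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = toy_loss_grad(x + delta)
        g = g / (np.linalg.norm(g) + 1e-12)   # normalized ascent direction
        delta = delta + alpha * g
        norm = np.linalg.norm(delta)
        if norm > eps:                        # project back onto the ball
            delta = delta * (eps / norm)
    return delta

x = np.array([1.0, -2.0])
delta = pgd_l2(x)
print(np.linalg.norm(delta))  # perturbation stays within the eps ball
```

In this single-point view the adversary returns one perturbation per input; the distributional adversary in the paper instead returns a whole distribution over such perturbations.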




Adversarial Distributional Training for Robust Deep Learning

Deng, Zhijie, Dong, Yinpeng, Pang, Tianyu, Su, Hang, Zhu, Jun

arXiv.org Machine Learning

Adversarial training (AT) is among the most effective techniques for improving model robustness by augmenting training data with adversarial examples. However, adversarially trained models often fail to generalize well to test data or to attack algorithms unseen during training, which leaves room for improvement. In this paper, we introduce a novel adversarial distributional training (ADT) framework for learning robust models. Specifically, we formulate ADT as a minimax optimization problem, where the inner maximization aims to learn an adversarial distribution that characterizes the potential adversarial examples around a natural input, and the outer minimization aims to train robust classifiers by minimizing the expected loss over the worst-case adversarial distributions. We conduct a theoretical analysis of how to solve the minimax problem, leading to a general algorithm for ADT, and further propose three different approaches to parameterize the adversarial distributions. Empirical results on various benchmarks validate the effectiveness of ADT compared with state-of-the-art AT methods.
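The alternating minimax structure described in the abstract can be sketched on a deliberately tiny hypothetical instance (the scalar model, the Gaussian adversary with fixed variance, and all constants are assumptions for illustration, not the paper's setup): each step first ascends the adversarial distribution's mean on the expected loss, then descends the model weight on samples drawn from the learned distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = 1.0, 2.0          # one training pair for a scalar linear model
w, mu, sigma = 0.0, 0.0, 0.1   # model weight; adversary is N(mu, sigma^2)
lr_w, lr_mu = 0.05, 0.1

for step in range(200):
    # Inner maximization: gradient ascent on mu. For the quadratic loss
    # (w*(x + delta) - y)^2, d/dmu E[loss] = 2*w*(w*(x + mu) - y).
    mu += lr_mu * 2 * w * (w * (x + mu) - y)
    mu = np.clip(mu, -0.3, 0.3)   # keep the perturbation mean bounded

    # Outer minimization: gradient descent on w against a Monte Carlo
    # estimate of the expected loss under the current adversary.
    delta = mu + sigma * rng.standard_normal(32)
    grad_w = np.mean(2 * (w * (x + delta) - y) * (x + delta))
    w -= lr_w * grad_w

print(round(w, 2))
```

The adversary drives mu toward the boundary that hurts the model most, and the model then fits the shifted inputs, which is the minimax interplay in miniature; the paper's algorithm does the analogue with deep networks and richer distribution parameterizations.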